AI Considerations 2024-05-01
gpt.iconYou are considering how to make effective use of scanned book data. In particular, you are exploring ways to deconstruct and reconstruct text, in the hope that this leads to a deeper understanding of the content and to new perspectives. You have introduced a method called "Fractal Summary", which you see as an effective way of reconstructing textual data into a more usable form. You also mention an approach that aims at a more comprehensive understanding by extracting and comparing the parts of books by different authors that deal with a particular topic; this is a promising strategy for integrating information and generating new insights. In addition, you plan to centralize data from individual books and multiple other sources in Scrapbox, which could be useful for everyday study, research, and presentation preparation. Ultimately, what matters is evaluating how these tools and methods can contribute to efficient and effective intake and use of knowledge.
nishio.iconWhat about Next Action?
gpt.iconThe next actions would be the following steps:
Specific implementation of fractal summarization:
Develop a concrete plan for applying fractal summarization to the existing scanned book data, clearly defining which techniques will be used and in what format the output will be produced (a rough sketch follows after this list).
Organize and expand Scrapbox:
Consider how to organize the pages in Scrapbox and how to integrate the new summarization techniques, focusing on enhancements that improve working efficiency, such as automatically inserting a summary for every 50 pages.
Development of a cross-book analysis tool:
Develop a tool or method for extracting and comparing content on the same topic from different books. By making thematic comparison and analysis easier, it should yield deeper insights (see the second sketch after this list).
Test prototypes and collect feedback:
Test prototypes of the summarization and analysis tools, gather feedback from yourself and from other users, and use that feedback to improve the tools and make them easier to use.
Documentation and sharing:
Document how the tools are used and what findings they produce, and share this with other researchers and users as needed. This facilitates knowledge sharing within the community and invites suggestions for improving the tools.
By proceeding with these steps, we expect to make effective use of scanned book data and deepen our knowledge.
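As a rough illustration of the fractal summarization step, here is a minimal sketch: split the text into chunks, summarize each chunk, then summarize the concatenation of those summaries, yielding a small tree whose root is the coarsest summary and whose leaves cover the original chunks. `summarize_chunk` is a placeholder (in practice it would likely be an LLM call), and the chunk size is arbitrary.

```python
# A minimal sketch of fractal summarization (assumptions: plain-text input,
# a placeholder summarizer). Split the text into chunks, summarize each chunk,
# then summarize the concatenation of those summaries, producing a small tree:
# the root is the coarsest summary, the leaves cover the original chunks.

def summarize_chunk(text: str) -> str:
    # Placeholder: in practice this would call an LLM or another summarizer.
    # Truncation is used only so the sketch runs without external services.
    return text[:200]

def fractal_summary(text: str, chunk_size: int = 2000, depth: int = 0) -> dict:
    """Return a tree of summaries covering `text` at increasing granularity."""
    if len(text) <= chunk_size:
        return {"depth": depth, "summary": summarize_chunk(text), "children": []}
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    children = [fractal_summary(chunk, chunk_size, depth + 1) for chunk in chunks]
    merged = "\n".join(child["summary"] for child in children)
    return {"depth": depth, "summary": summarize_chunk(merged), "children": children}
```

For the cross-book analysis step, a hedged sketch of the core idea: collect passages from several books, then rank them against a topic query so that different authors' treatments of the same theme can be compared side by side. TF-IDF (via scikit-learn) is used only to keep the example self-contained; an embedding model would likely be the real choice. The book titles and passages are invented for illustration.

```python
# A sketch of topic-based cross-book comparison: rank passages from several
# books by similarity to a topic query. TF-IDF keeps the example dependency-light;
# the passages below are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def passages_on_topic(topic, passages, top_k=3):
    """passages: list of (book_title, passage_text); returns the top_k best matches."""
    texts = [text for _, text in passages]
    matrix = TfidfVectorizer().fit_transform(texts + [topic])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    ranked = sorted(zip(scores, passages), key=lambda pair: pair[0], reverse=True)
    return [(book, text, float(score)) for score, (book, text) in ranked[:top_k]]

# Hypothetical passages from two books, compared on the topic "memory".
books = [
    ("Book A", "Memory consolidation is strengthened during sleep."),
    ("Book B", "Note-taking externalizes memory so that ideas can be recombined later."),
    ("Book A", "The third chapter surveys the history of the printing press."),
]
for book, text, score in passages_on_topic("memory", books, top_k=2):
    print(f"{score:.2f}  {book}: {text}")
```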
Impressions:
the other
---
This summary is not good either.
---
nishio.iconI want to enable software systems to perform the activity of "reading a book". What is happening when a human reads a book? How is a software system's reading of a book different from indexing the book and making it searchable?
gpt.iconThe essence of the act of reading a book is the process of extracting, understanding, and interpreting information from the text. As a basic approach for a software system to "read" a book, the following steps can be considered:
Text preprocessing:
Convert text to digital format (e.g., extract text from a scanned book with OCR technology).
Perform data cleansing to remove typos and unnecessary formatting and standardize text.
Parsing:
Break sentences into words and phrases and analyze their grammatical structure; this includes part-of-speech tagging and dependency parsing (a small sketch follows after this list).
Semantic analysis:
Extract the meaning of words and phrases and understand how they function within each sentence.
Grasp the information and intent of each sentence or paragraph in order to analyze the overall theme or argument of the text.
Information extraction and knowledge generation:
Identify key information, characters, settings, plot elements, etc., and track how they relate to each other and develop.
Generate new knowledge and summaries from the text, and present the content in a form the user can easily understand.
Continuous learning and adaptation:
Continuously train models to be flexible and responsive to texts of various genres and styles.
Through this process, software can not only read text, but "understand" its content and provide information in a way that is meaningful to the human reader.
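As a tiny, concrete sketch of the preprocessing, parsing, and extraction steps above: the following assumes spaCy and its small English model as one possible toolkit (the model has to be downloaded separately), and the cleansing rules and the sample sentence are purely illustrative.

```python
# Sketch of steps 1, 2 and 4: cleanse OCR text, parse it, extract entities.
# Assumes spaCy with "en_core_web_sm" installed; the regex rules are illustrative.
import re
import spacy

def clean_text(raw: str) -> str:
    raw = re.sub(r"-\n(\w)", r"\1", raw)  # re-join words hyphenated across OCR line breaks
    raw = re.sub(r"\s+", " ", raw)        # collapse whitespace left over from scanning
    return raw.strip()

nlp = spacy.load("en_core_web_sm")

doc = nlp(clean_text("Gutenberg intro-\nduced the printing press in Mainz around 1440."))

for token in doc:          # part-of-speech tagging and dependency parsing
    print(token.text, token.pos_, token.dep_, token.head.text)

for ent in doc.ents:       # named entities: people, places, dates, ...
    print(ent.text, ent.label_)
```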
gpt.icon"Knowledge generation" refers to the process of creating new knowledge from given data and information. This process involves not only collecting and recording data, but also analyzing and understanding it, and even drawing new insights and meaning from it. In the context of natural language processing (NLP), knowledge generation from text involves the following steps:
Information Extraction: Identify and extract specific information (person names, places, dates, facts, etc.) from text.
Identifying relevance: Recognize relationships among the extracted pieces of information and evaluate which of them are important.
Pattern recognition: analyze large amounts of text data to identify patterns and trends.
Reasoning: Draw new conclusions or hypotheses from existing information (a toy sketch follows after this list).
Summarize and integrate: summarize the main points of the entire text and integrate new perspectives and insights.
Thus, knowledge generation is an activity that does not treat data as mere information to be processed, but analyzes and interprets it in a larger context to form new understanding. For example, when software reads a history book, knowledge generation is not simply listing facts, but deriving historical patterns and causal relationships from those facts and proposing new historical views based on them.
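To make the "reasoning" step a little more concrete, here is a toy sketch: given relation triples that might have been extracted from history books, chain them to propose a new, indirect relation. The facts and the single rule are invented for illustration, not the output of any real extraction.

```python
# Toy reasoning over extracted facts: if A enabled B and B enabled C,
# propose that A indirectly enabled C. The facts below are invented examples.
facts = {
    ("printing press", "enabled", "cheap books"),
    ("cheap books", "enabled", "mass literacy"),
}

def infer_indirect(facts):
    inferred = set()
    for a, r1, b in facts:
        for b2, r2, c in facts:
            if r1 == "enabled" and r2 == "enabled" and b == b2 and a != c:
                inferred.add((a, "indirectly enabled", c))
    return inferred

print(infer_indirect(facts))
# expected: {('printing press', 'indirectly enabled', 'mass literacy')}
```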
nishio.iconWhat exactly does identifying relevance mean?
gpt.iconRelevance identification is the process of understanding the relationships between the pieces of information provided in a text: which pieces of information are related to which others, and how. This process is important for a deeper understanding of the content. The specific steps by which relevance is identified are as follows:
Entity identification: extracts important entities (people, places, organizations, dates, etc.) from the text.
Relationship extraction: Analyze verbs, prepositions, and conjunctions to identify relationships between entities. For example, from the sentence "John lives in London", a "place of residence" relationship is identified between "John" and "London" (a small sketch follows after this list).
Use of Context: Evaluate the importance of an entity or its relationship depending on the context in which it appears in the text. For example, if a character in a story repeatedly mentions a particular place, that place may have an important role in the story.
Causal identification: Identify causal relationships between events involving the entities. For example, the statement "The party was canceled after it rained" can be interpreted as the rain causing the party's cancellation.
Pattern identification: Analyze large amounts of data to find general patterns of association between entities, so that known patterns can be used to quickly identify relationships even when encountering new text.
Thus, identifying relevance is key to a deeper understanding of a text's meaning: rather than considering each piece of information in isolation, it asks how the information is connected as a whole. This process is central to natural language processing, particularly information extraction and knowledge graph construction.
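A hedged sketch of the relationship-extraction step, using spaCy's dependency parse (assumed here as one possible tool) to pull out subject, verb, and prepositional-object triples. Real relation extraction is considerably more involved; this only shows how "John lives in London" can be turned into a (John, live in, London) triple.

```python
# Extract simple (subject, verb + preposition, object) triples from a sentence
# using spaCy's dependency labels. Assumes "en_core_web_sm" is installed.
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_triples(text: str):
    doc = nlp(text)
    triples = []
    for token in doc:
        if token.pos_ == "VERB":
            subjects = [w for w in token.children if w.dep_ == "nsubj"]
            preps = [w for w in token.children if w.dep_ == "prep"]
            for subj in subjects:
                for prep in preps:
                    for obj in prep.children:
                        if obj.dep_ == "pobj":
                            triples.append((subj.text, f"{token.lemma_} {prep.text}", obj.text))
    return triples

print(extract_triples("John lives in London."))
# expected: [('John', 'live in', 'London')]
```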
---
This page is auto-translated from /nishio/AIの考察2024-05-01 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.